Abstract

In a previous report we have evaluated analytically the mutual information between the firing 
rates of N independent units and a set of multi-dimensional continuous+discrete stimuli, for a 
finite population size and in the limit of large noise. Here, we extend the analysis to the case of 
two interconnected populations, where input units activate output ones via gaussian weights and 
a threshold linear transfer function. We evaluate the information carried by a population of M 
output units, again about continuous+discrete correlates. The mutual information is evaluated 
solving saddle point equations under the assumption of replica symmetry, a method which, by 
taking into account only the term linear in N of the input information, is equivalent to assuming 
the noise to be large. Within this limitation, we analyze the dependence of the information on 
the ratio M/N, on the selectivity of the input units and on the level of the output noise. We 
show analytically, and confirm numerically, that in the limit of a linear transfer function and of a 
small ratio between output and input noise, the output information approaches asymptotically the 
information carried in input. Finally, we show that the information loss in output does not depend 
much on the structure of the stimulus, whether purely continuous, purely discrete or mixed, but 
only on the position of the threshold nonlinearity, and on the ratio between input and output noise. 
I. INTRODUCTION



Recent analyses of extracellular recordings performed in two motor areas of behaving
monkeys have tried to clarify how information about movements is transmitted and received
from higher to lower stages of processing, and to identify distinct roles of the two areas in
the planning and execution of movements [1]. Although this study failed to produce clearcut
results, it remains interesting to try and understand, from a more theoretical point of view,
how information about multi-dimensional correlates of neural activity may be transmitted
from the input to the output of a simple network. In fact, a theoretical study is still lacking,
which explores how the coding of stimuli with continuous as well as discrete dimensions is
transferred across a network.

Information theory has been widely used in the theory of communication, in the presence
of both binary [3], [4], [5] and linear [6], [7] or weakly non-linear [8] channels. Moreover it has
been recently proposed as an effective tool to explore the coding properties of neurons (see
for example [9], [10], [11]), via both direct estimates from real data (for a review see [12]) and
pure theoretical modelling [14], [15], [16], [17], [18].

The mutual information provides a quantitative and flexible measure of the efficiency 
of single cells or of population of cells in coding external stimuli and events relevant to 
behaviour: high values of the mutual information are obtained when the correlates can 
be discriminated with a small uncertainty on the basis of the neural responses; moreover 
the same formalism can be adapted to explore different types of code, from simple time- 
averaged rates, to more sophisticated descriptions, where the exact temporal sequence of 
action potentials is considered to be relevant. 



In a previous report [13] the mutual information between the time averaged rates of
a finite population of N units and a set of correlates, which have both a discrete and a
continuous angular dimension, has been evaluated analytically in the limit of large noise.
This parameterization of the correlates can be applied to movements performed in a given
direction and classified according to different "types"; yet it is equally applicable to other
correlates, like visual stimuli characterized by an orientation and a discrete feature (colour,
shape, etc.), or in general to any correlate which can be identified by an angle and a
"type". In this study, we extend the analysis performed for one population to consider two
interconnected areas, and we evaluate the mutual information between the firing rates of a
finite population of M output neurons and a set of continuous+discrete stimuli, given that
the rate distribution in input is known. In input, a threshold nonlinearity has been shown
to lower the information about the stimuli in a simple manner, which can be expressed as
a renormalization of the noise. How does the information in the output depend on the
same nonlinearity? How does it depend on the noise in the output units? Is the information
transmission from input to output sensitive to the structure of the correlate, whether discrete
or continuous?

We address these issues by calculating the mutual information, using the replica trick and
under the assumption of replica symmetry (see for example [19]).

Saddle point equations are solved numerically. We analyze how the information transmission
depends on the parameters of the model, i.e. on the level of output and input noise, on
the ratio between the two population sizes, as well as on the tuning curve with respect to
the continuous correlate, and on the number of discrete correlates.
The input-output transfer function is a crucial element in the model. Many earlier theoretical
and simulation studies [20] have mainly focused on binary and sigmoidal functions;
yet more recent investigations [21], [22] have shown that the current-to-frequency transduction
typical of real neurons is well captured, away from saturation, by a threshold-linear function.
Such a function combines the threshold of real neurons, the linear behaviour typical
of pyramidal neurons above threshold, and the accessibility to a full analytical treatment
[14], [23], as demonstrated here, too. For the sake of analytical feasibility, however, we take
the input units to be purely Gaussian. Therefore it should be kept in mind, in considering
the final results, that the threshold nonlinearity is only applied to the output units.

II. THE MODEL 



In analogy to the model studied in [13] we consider a set of N input units which fire to an
external continuous+discrete stimulus, parameterized by an angle $\vartheta$ and a discrete variable
$s$, with a Gaussian distribution:

$$p(\{\eta_j\}|\vartheta,s) = \prod_{j=1}^{N}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(\eta_j-\tilde\eta_j(\vartheta,s))^2}{2\sigma^2}\right]; \tag{1}$$

$\eta_j$ is the firing rate in one trial of the $j$th input neuron, while the mean of the distribution,
$\tilde\eta_j(\vartheta,s)$, is written:

$$\tilde\eta_j(\vartheta,s) = \varepsilon_s^j\,\bar\eta_j(\vartheta) + (1-\varepsilon_s^j)\,\eta^f; \tag{2}$$

$$\bar\eta_j(\vartheta) = \eta^0\cos^{2m}\left(\frac{\vartheta-\vartheta_j^0}{2}\right); \tag{3}$$

where $\varepsilon_s^j$ is a quenched random variable distributed between 0 and 1, and $\vartheta_j^0$ is the preferred
direction for neuron $j$. According to eq.(2) neurons fire at an average firing rate which
modulates with $\vartheta$ with amplitude $\varepsilon_s$, or takes a fixed value $\eta^f$, independently of $\vartheta$, with
amplitude $1-\varepsilon_s$.

We assume that quenched disorder is uncorrelated and identically distributed across units 
and across the K discrete correlates, and that for each neuron all preferred directions are 
equally likely: 

$$\varrho(\{\varepsilon_s^i\}) = \prod_{i,s}\varrho(\varepsilon_s^i) = [\varrho(\varepsilon)]^{NK}; \qquad \varrho(\{\vartheta_i^0\}) = \prod_i\varrho(\vartheta_i^0) = \frac{1}{(2\pi)^N}. \tag{4}$$

In [13] it has been shown that a cosinusoidal shaped function as in eq.(3) is able to
capture the main features of directional tuning of real neurons in motor cortex. Moreover it
has been shown that the presence of negative firing rates in the distribution (1), which is not
biologically plausible, does not alter information values, with respect to a more realistic choice
for the firing distribution, in that it leads to the same curves except for a renormalization of
the noise.

Output neurons are connected to input neurons via uncorrelated Gaussian connection
weights $J_{ij}$. Each output neuron performs a linear summation of the inputs; the outcome is
distorted by a Gaussian distributed fast noise $\delta_i$ and then thresholded, as in the following:

$$\xi_i = \Big[\xi^0 + \sum_j c_{ij}J_{ij}\eta_j + \delta_i\Big]^+; \qquad i = 1..M,\ j = 1..N \tag{5}$$

In eq.(5) $\xi^0$ is a threshold term, $c_{ij}$ is a (0,1) binary variable, with mean $c$, which expresses
the sparsity or dilution of the connectivity matrix, and

$$\langle (J_{ij})^2\rangle = \sigma_J^2; \qquad \langle J_{ij}\rangle = 0; \tag{6}$$

$$\langle (\delta_i)^2\rangle = \sigma_\delta^2; \qquad \langle \delta_i\rangle = 0; \tag{7}$$

$$p(c_{ij}=1) = c; \qquad p(c_{ij}=0) = 1-c; \tag{8}$$

$$[x]^+ = x\,\Theta(x). \tag{9}$$
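The generative model of eqs.(1)-(9) can be sketched in a few lines of code. The following is only an illustrative sampler, not the authors' code: the function names are hypothetical, the three-valued distribution of $\varepsilon$ and the parameter values are borrowed from the figure captions, and the fixed rate is assumed to be $\eta^f = \alpha\eta^0$ with $\alpha = 0.2$.

```python
import math
import random

def mean_input_rate(theta, eps, theta0, eta0, eta_f, m=1):
    """Mean rate of one input unit for stimulus angle theta, eqs. (2)-(3)."""
    tuning = eta0 * math.cos((theta - theta0) / 2.0) ** (2 * m)
    return eps * tuning + (1.0 - eps) * eta_f

def build_network(N, M, K, C, sigma_J, rng):
    """Draw the quenched variables eps[j][s], theta0[j], J[i][j], c[i][j]."""
    # Assumption: eps takes the values 0, 1/2, 1 with equal probability.
    eps = [[rng.choice([0.0, 0.5, 1.0]) for _ in range(K)] for _ in range(N)]
    theta0 = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N)]
    J = [[rng.gauss(0.0, sigma_J) for _ in range(N)] for _ in range(M)]
    # Diluted connectivity: each connection is present with probability c = C/N.
    c = [[1 if rng.random() < C / N else 0 for _ in range(N)] for _ in range(M)]
    return eps, theta0, J, c

def sample_trial(theta, s, net, xi0, sigma, sigma_delta, eta0, eta_f, m, rng):
    """One trial: Gaussian input rates (eq. 1), threshold-linear outputs (eq. 5)."""
    eps, theta0, J, c = net
    eta = [rng.gauss(mean_input_rate(theta, eps[j][s], theta0[j], eta0, eta_f, m), sigma)
           for j in range(len(theta0))]
    xi = []
    for J_i, c_i in zip(J, c):
        h = xi0 + sum(cij * Jij * ej for cij, Jij, ej in zip(c_i, J_i, eta))
        h += rng.gauss(0.0, sigma_delta)   # fast output noise delta_i
        xi.append(max(h, 0.0))             # [x]^+ = x * Theta(x), eq. (9)
    return eta, xi

rng = random.Random(0)
eta0 = math.sqrt(0.1)                      # (eta^0)^2 = 0.1
net = build_network(N=100, M=10, K=4, C=10.0, sigma_J=math.sqrt(0.1), rng=rng)
eta, xi = sample_trial(theta=1.0, s=2, net=net, xi0=-0.4, sigma=1.0,
                       sigma_delta=1.0, eta0=eta0, eta_f=0.2 * eta0, m=1, rng=rng)
```

With $C = 10$ and $\sigma_J^2 = 0.1$ the sketch respects the normalization $C\sigma_J^2 = 1$ used in the figures; note that the Gaussian input rates can be negative, whereas the output rates cannot.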

III. ANALYTICAL ESTIMATION OF THE MUTUAL INFORMATION 

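We aim at estimating the mutual information between the output patterns of activity
and the continuous+discrete stimuli:

$$I(\{\xi_i\},\vartheta\otimes s) = \left\langle\sum_s\int d\vartheta\int\prod_i d\xi_i\; p(\vartheta,s)\,p(\{\xi_i\}|\vartheta,s)\log_2\frac{p(\{\xi_i\}|\vartheta,s)}{p(\{\xi_i\})}\right\rangle_{\varepsilon,\vartheta^0,c,J,\delta}; \tag{10}$$

$$p(\{\xi_i\}|\vartheta,s) = \int\prod_j d\eta_j\; p(\{\xi_i\}|\{\eta_j\})\,p(\{\eta_j\}|\vartheta,s); \tag{11}$$

where the distribution $p(\{\xi_i\}|\{\eta_j\})$ is determined by the threshold linear relationship (5),
$p(\{\eta_j\}|\vartheta,s)$ is given in eq.(1) and $\langle ..\rangle_{\varepsilon,\vartheta^0,c,J,\delta}$ is a short notation for the average across the
quenched variables $\{\varepsilon_s^i\}$, $\{\vartheta_i^0\}$, $\{c_{ij}\}$, $\{J_{ij}\}$ and on the fast noise $\{\delta_i\}$.

Contrary to the other quenched variables, $\{\varepsilon_s^i\}$, $\{\vartheta_i^0\}$, $\{J_{ij}\}$, $\{c_{ij}\}$, the variable $\delta_i$ in eq.(5)
is annealed: integration of relationship (5) across a zero mean Gaussian distribution of $\delta_i$,
with variance $\sigma_\delta^2$, yields a Gaussian distribution in $\xi_i$ with variance $\sigma_\delta^2$. We assume that the
stimuli are equally likely: $p(\vartheta,s) = 1/2\pi K$.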

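Eq.(10) can be written as:

$$I(\{\xi_i\},\vartheta\otimes s) = \langle H(\{\xi_i\}|\vartheta,s)\rangle_{\vartheta,s} - H(\{\xi_i\}) \tag{12}$$

with:

$$\langle H(\{\xi_i\}|\vartheta,s)\rangle_{\vartheta,s} = \left\langle\sum_s\int d\vartheta\int\prod_i d\xi_i\; p(\vartheta,s)\,p(\{\xi_i\}|\vartheta,s)\log_2 p(\{\xi_i\}|\vartheta,s)\right\rangle_{\varepsilon,\vartheta^0,c,J,\delta}; \tag{13}$$

$$H(\{\xi_i\}) = \left\langle\sum_s\int d\vartheta\int\prod_i d\xi_i\; p(\vartheta,s)\,p(\{\xi_i\}|\vartheta,s)\log_2\left[\sum_{s'}\int d\vartheta'\,p(\vartheta',s')\,p(\{\xi_i\}|\vartheta',s')\right]\right\rangle_{\varepsilon,\vartheta^0,c,J,\delta}. \tag{14}$$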



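The analytical evaluation of $\langle H(\{\xi_i\}|\vartheta,s)\rangle_{\vartheta,s}$ can be performed inserting eq.(11) in the
expression (13), and using the replica trick [19] to get rid of the sums under logarithm; since
these sums already multiply the logarithm, all replica indexes run from 1 up to $n+1$:

$$\langle H(\{\xi_i\}|\vartheta,s)\rangle_{\vartheta,s} = \lim_{n\to 0}\frac{1}{n\ln 2}\left\langle\sum_s\int d\vartheta\,p(\vartheta,s)\int\prod_i d\xi_i\prod_{\alpha=1}^{n+1}\left[\int\prod_j d\eta_j^\alpha\; p(\{\xi_i\}|\{\eta_j^\alpha\})\,p(\{\eta_j^\alpha\}|\vartheta,s)\right] - 1\right\rangle_{\varepsilon,\vartheta^0,c,J,\delta} \tag{15}$$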
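The replica trick rests on the elementary identity $p\ln p = \lim_{n\to 0}(p^{n+1}-p)/n$, which is why the replica index runs up to $n+1$ and a term $-1$ appears once the normalized distribution is integrated. A minimal numerical check of this identity, with an arbitrary illustrative value of $p$ and a hypothetical helper name, could read:

```python
import math

def replica_estimate(p, n):
    """Finite-n approximation (p**(n+1) - p) / n of p * ln(p)."""
    return (p ** (n + 1) - p) / n

p = 0.3                       # arbitrary probability, for illustration only
exact = p * math.log(p)
for n in (1e-1, 1e-3, 1e-5):
    # The estimate approaches the exact value linearly in n.
    print(n, replica_estimate(p, n), exact)
```

The leading correction is $p\,n(\ln p)^2/2$, so the finite-$n$ estimate converges linearly in $n$.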
To take into account the threshold-linear relation (5) we consider the following equalities:



P POO 

/ d&IIp&W}) = n^te = oM}) + / = 

/0 ^ /-co ^ 

n^w-^-E^-^ -«?)+/ ^n^-tf-E^-w -^). (i6) 
-oo JO 



Inserting eq.(16) in eq.(15) one obtains:



(^({6}Ka)>^ = IE / *W a ) J Rd V « (t[ P ( V «\V,s) 

1 \ s i,a \j,a 

n f nc(nw-^-E^-^ 

J— CO \ „ 



+ r (n - # - e cy-w - 

•'0 \ n 



c,J,S. 



,J,S 
\ 



(17) 



The average across the quenched disorder $c, J, \delta$ in eq.(17) can be performed in a very
similar way as shown in [22]: using the integral representation for each $\delta$ function, Gaussian
integration across $J, \delta$ is standard; the average on $\{c_{ij}\}$ can be performed assuming that the
number N of input neurons is large, that is for very small $c$.

The final outcome for $\langle H(\{\xi_i\}|\vartheta,s)\rangle_{\vartheta,s}$ reads:



<ff({&}iM>*,. = H-h (e / / n^ a (np«i^ s ) 

\ s j,a \j,a 



■£><> 



+ f°# /n^e-K 2 / 2 )Sj- a ) 2 e -( c 4/2iv)E Q ,^ a ^E^K e -^o)E^ a 



I AT 



1 (18) 
where we have put $c = C/N$.

Integration on $\{x^\alpha\}$ is straightforward. Integration on $\{\eta_j^\alpha\}$ can be performed introducing
$(n+1)^2$ auxiliary variables $z_{\alpha\beta} = \frac{1}{N}\sum_j\eta_j^\alpha\eta_j^\beta$ via $\delta$ functions expressed in their integral
representation. Considering the expression (1) for the input distribution and with some
rearrangement of the terms the final result can be expressed as:



mm*,*))* 



lim 



n 



dz, 



a/3 



Y[dz af3 e N ^ 



, %0 z a pz<x3 g-f- Tr InS 



e 2 



rr ^!_ e -E„ >/3 « a -ft)(^/2)«"-co) 

"DO rv V27T 



+ 



r -r^(G-i/2)(t-z y 



o (2ttH 



- 1 



(19) 

N 

e,0° 



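where:

$$\Sigma_{\alpha\beta} = \delta_{\alpha\beta} + 2\sigma^2\,i\tilde z_{\alpha\beta}; \qquad G_{\alpha\beta} = \sigma_\delta^2\,\delta_{\alpha\beta} + C\sigma_J^2\,z_{\alpha\beta}. \tag{20}$$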



The evaluation of $H(\{\xi_i\})$, eq.(14), can be carried out in a very similar way, introducing
replicas in the continuous+discrete stimulus space. The final result reads:



Tr lnS 



n-onln2 \J **2ir/N J ^ 

J] / dtit.Mn+t [p(0, S)] n+1 / e -E Q , /3 (^- s ^)^(^.^)'?(^.^)/ 2 - S 



A? 



e 2 



Tr InG 



TT^ e -^"^ (r - ?o)(G ^ /2)(C/5 - fo) + /°° 



(21) 



^_ P -E„ >/3 (G^/2)(r&) 2 



n + l 



Ai 
IV. REPLICA SYMMETRIC SOLUTION



The integrals in eqs.(19), (21) cannot be solved without resorting to an approximation. In
analogy to what is used in [14], [22], we use a saddle-point approximation (which in general
would be valid in the limit $M, N \to \infty$) and we assume replica symmetry [19] in the
parameters $\{z_{\alpha\beta}\}$, $\{\tilde z_{\alpha\beta}\}$. This allows one to explicitly invert and diagonalize the matrices
$G$ and $\Sigma$:

$$z_{\alpha\alpha} = z_0(n); \quad z_{\alpha\neq\beta} = z_1(n); \quad i\tilde z_{\alpha\alpha} = \tilde z_0(n); \quad i\tilde z_{\alpha\neq\beta} = -\tilde z_1(n). \tag{22}$$



In [22] it has been shown that the replica symmetric solution for the information transmitted
by a threshold-linear net is stable in most of the phase diagram. A detailed study of
the stability of the RS solution in the specific case of mixed continuous and discrete stimuli
will be presented elsewhere [24]. The saddle point approximation seems to have more subtle
implications in the present situation, as will be discussed in the next section and in the
final discussion.

In replica symmetry the mutual information can be expressed as follows: 



/({&}, 0® s) = lim 1 J e ^[(n+l)^^-n(n+l)% A ^-i(TrlBG(^ ! ^)+F( Zo A ,^))-iTrlnE(2 ^ ) ^)-^( 

n->o n In 2 I 



-e 
with



F(z , Zl )=-21n 



J-oo V" v/2tt Jo 



^L e -E ,^/2)(t&) 2 



(2tt; 



n + l 



H A (z ,Z!) = -—In 



iV 



^ / d$p($,s) (e 



•E„ >/3 (*«/9- E ai) j j(».*) 2 /2« 



iV 



•,0° 



(24) 
(25) 



fr^,fi)=--iii 



/ ^i..^ n+ i[p(A s )] n+i ^ e -E a ^(*«/3-^)«(*-.»»)^.»/j)/^ 



Sl..S n +l 



N 



(26) 



We have set $r = M/N$, and $z_0^{A,B}$, $\tilde z_0^{A,B}$, $z_1^{A,B}$, $\tilde z_1^{A,B}$ are the solutions of the saddle point equations:



A.B 

V 



A.B 



~A,B 
V 



~A,B 
Z\ 



_d_ 

dz 



Tr In E^q,^) +H A ' B (z Q ,z 1 ) 



1 d 



-Tr In £(So,*i)+# AB (zb,zi) 



n 
9 r 

— - [Tr In G( 20,21) + F(z ,^i)] ; 
n oz\ 2 



(27) 



All the equations must be evaluated in the limit $n \to 0$. It is easy to check that all terms
in the exponent in eq.(23) are order $n$. In fact, since when $n \to 0$ only one replica remains,
one has:

$$\lim_{n\to 0}\left[\mathrm{Tr}\ln G(z_0,z_1) + F(z_0,z_1)\right] = 0 \;\Longrightarrow\; \mathrm{Tr}\ln G(z_0,z_1) + F(z_0,z_1) \simeq n\,\frac{\partial}{\partial n}\left[\mathrm{Tr}\ln G(z_0,z_1) + F(z_0,z_1)\right]\Big|_{n=0}. \tag{28}$$
Therefore, from the saddle point equations, $\tilde z_0^{A,B}$ is order $n$ and $\mathrm{Tr}\ln\Sigma$ is also order $n$:

$$\mathrm{Tr}\ln\Sigma \simeq n\,\frac{\partial}{\partial n}\mathrm{Tr}\ln\Sigma\Big|_{n=0}. \tag{29}$$



Since z$' B = nz ' , it is easy to check by explicit evaluation that, when n — > 0, all the 
n+1 diagonal terms among the matrix elements {5 a p — are order n and all the n{n + 1) 



out-of-diagonal terms are order 1. Then all terms in the exponent of eqs.(25), (26) are order
$n$, and we can expand the exponentials, which allows us to perform the quenched averages
across $\{\varepsilon, \vartheta^0\}$. Considering the expression of $\tilde\eta(\vartheta,s)$, eq.(2), one obtains:



H A (z£,z?) ~ n(l^-^)Aj; 



H B {z*, z B ) ~ n f (lo - 2?)Aj + 1 + ^5g5 i A J " A ?J J 5 ( 30 ) 



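$$\Lambda^1 = \frac{1}{K}\sum_s\int\frac{d\vartheta}{2\pi}\left\langle[\tilde\eta(\vartheta,s)]^2\right\rangle_{\varepsilon,\vartheta^0} = (\eta^0)^2\left[(A_2 + \alpha^2 - 2\alpha A_1)\langle\varepsilon^2\rangle_\varepsilon + \alpha^2 + 2\alpha(A_1-\alpha)\langle\varepsilon\rangle_\varepsilon\right]; \tag{31}$$

$$\Lambda^2 = \frac{1}{K^2}\sum_{s_1,s_2}\int\frac{d\vartheta_1 d\vartheta_2}{(2\pi)^2}\left\langle\tilde\eta(\vartheta_1,s_1)\,\tilde\eta(\vartheta_2,s_2)\right\rangle_{\varepsilon,\vartheta^0} = (\eta^0)^2\left[(A_1-\alpha)^2\left(\frac{K-1}{K}\langle\varepsilon\rangle_\varepsilon^2 + \frac{1}{K}\langle\varepsilon^2\rangle_\varepsilon\right) + \alpha^2 + 2\alpha(A_1-\alpha)\langle\varepsilon\rangle_\varepsilon\right]; \tag{32}$$

$$A_1 = \frac{1}{2^{2m}}\binom{2m}{m}; \qquad A_2 = \frac{1}{2^{4m}}\binom{4m}{2m}; \qquad \alpha = \frac{\eta^f}{\eta^0}. \tag{33}$$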
A similar expansion in $n$ for $\mathrm{Tr}\ln\Sigma(\tilde z_0,\tilde z_1)$ and for $\mathrm{Tr}\ln G(z_0,z_1) + F(z_0,z_1)$ allows one to
derive explicitly the saddle point equations:



A,B 

V 



~A,B 



a 2 + M 



2^f . , 
1- A ■ 

2a 4 z B 
2a 2 z B + 1 



+ 



-Caf- < a 



(1 + 2a 2 ~zff 



— —erf 

.VP + Qj {p + qf 2 P 



— A • 

-2aH?f 

e \ 



+ 



Dt 



1 + In erf 



Vp + qJ 

Vp , 



3 

J)2 



e-t 



q + p 



(34) 



where: 



Dx' = / dx'a(x'); a(x) = — e~~; 

-oo J—oo \2n 

$$p = \sigma_\delta^2 + C\sigma_J^2(z_0 - z_1); \qquad q = C\sigma_J^2 z_1.$$
From the expression of Zq ,b in eg. (|34"D , it is easy to verify that the dependence on z$' B 



in eg. (|30|) , which might affect the information in eg. (|23|) , cancels out with the products 
ZqZ ,ZqZ which should contribute to the information in the limit n — > (see eg.(p3|)). 
Therefore, since Zq' b is known and zf' B depends only on z±' B , the mutual information can 
be expressed as a function of zf ,B ,zf ' B , which in turn are to be determined self-consistently 
by the saddle point eguations. 
The average information per input cell can be written, finally:



N 



with 



In 2 



{zfzf 



zfzf + r 



rxCtf, z^-v^zt zt)] + rf (if ) - T^(zf)} 



(35) 



ri(zb,zi) = -o r — — -r + - In pert 



.Dt erf 



In 



erf 



(36) 



r^ A ) = +2 Mi + 2a 2 ^) - 5> 2 + a;; 



1 



zB 



rf (zf ) = - ln(l + 2a 2 5f ) - 2? (<x 2 + AJ) + - 



+ 2a 2 5f 



a; 



A 



(37) 
(38) 



The expression for the mutual information only contains terms linear in either N or M.
Since the last of the saddle-point equations, (34), contains $r$, if one fixes N and increases M
the information grows non-linearly, because the position of the saddle point varies. It turns
out that, as shown below, the growth is only very weakly sublinear, at least when $M < N$.
Analogously, fixing M and varying N we would find a non-linearity due to the $r$-dependence
of the saddle point. If $r$ is fixed and N and M grow together, the information rises purely
linearly.

What our analytical treatment misses out, however, is the nonlinearity required to appear
as the mutual information approaches its ceiling, the entropy of the stimulus set. The
approach to this saturating value was described at the input stage [14], [16], where also the
initial linear rise (in N) was obtained in the large noise limit [13], [15]. Our saddle point
method is in some sense similar to taking a large (input) noise limit, $\sigma \to \infty$, to its leading
(order $N/\sigma^2$) term. It is possible that the saddle point method could be extended, to
account also for successive terms in a large noise expansion. This would probably require
integrating out the fluctuations around the saddle point, but by carefully analysing the
relation of different replicas to different values of the quenched variables. We leave this
possible extension to future work. The present calculation, therefore, although employing a
saddle point method which is usually applicable for large N and M, should be considered
effectively as yielding the initial linear rise in the mutual information, the one observed with
M small.

V. NUMERICAL RESULTS 

Eq.(34) has been solved numerically using a Matlab code. Convergence to self-consistency
has been found already after 50 iterations, with an error lower than $10^{-10}$.
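A self-consistent solution of this kind is typically obtained by iterating the saddle point equations until successive iterates agree. The following is a generic sketch of such a fixed-point iteration, not the Matlab code used for the results: `solve_self_consistent` and its `damping` option are hypothetical, and the toy map $x = \cos x$ merely stands in for the right-hand side of the saddle point equations, which is not reproduced here.

```python
import math

def solve_self_consistent(update, x0, tol=1e-10, max_iter=1000, damping=1.0):
    """Iterate x <- (1 - damping) * x + damping * update(x) until the largest
    change between successive iterates falls below tol."""
    x = list(x0)
    for iteration in range(1, max_iter + 1):
        new = update(x)
        nxt = [(1.0 - damping) * a + damping * b for a, b in zip(x, new)]
        err = max(abs(a - b) for a, b in zip(x, nxt))
        x = nxt
        if err < tol:
            return x, iteration
    raise RuntimeError("fixed-point iteration did not converge")

# Toy stand-in for the saddle point map: solve x = cos(x).
solution, n_iter = solve_self_consistent(lambda v: [math.cos(v[0])], [1.0])
```

A damping factor below 1 can help when the plain iteration oscillates; whether this is needed for the actual saddle point equations is not stated in the text.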

Fig.1 shows the mutual information as a function of the output population size, for an
input population size equal to 100 cells. This is contrasted with the information in the
input units, about exactly the same set of correlates, calculated as in [13], by keeping only
the leading (linear) term in N. In fact, in [13] the mutual information carried by a finite
population of neurons firing according to eq.(1) had been evaluated analytically, in the limit
of large noise, by means of an expansion in $N(\eta^0)^2/4\sigma^2$. To linear order in N the analytical
expression for the information carried by N input neurons reads:

$$I(\{\eta_j\},\vartheta\otimes s) = \frac{1}{\ln 2}\frac{N}{2\sigma^2}\left(\Lambda^1 - \Lambda^2\right); \tag{39}$$

where $\Lambda^1$, $\Lambda^2$ are defined, again, as in eqs.(31), (32). In analogy to what had been done
in [14] we have set $C\sigma_J^2 = 1$. As evident from the graph, also the output information is
essentially linear up to a value of $r \simeq 0.5$, and quasi-linear even for $r = 1$. It should be
recalled, again, that our saddle point method only takes into account the term linear in N
in the information input units carry about the stimulus. It is not possible, therefore, for
eq.(35) to reproduce the saturation in the mutual information as it approaches the entropy
of the stimulus set (which is finite, if one considers only discrete stimuli). The nearly linear
behaviour in M thus reflects the linear behaviour in N induced, in the intermediate quantity
(the information available at the input stage), by our saddle point approximate evaluation.
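The linear input term can be evaluated directly once the tuning moments are known. The sketch below is an illustration under stated assumptions rather than the authors' code: it takes $A_1$, $A_2$ to be the angular averages of $\cos^{2m}$ and $\cos^{4m}$, $\alpha = \eta^f/\eta^0$, and $\Lambda^1$, $\Lambda^2$ to be the quenched averages of eqs.(31), (32) as read here; all function names are hypothetical.

```python
import math

def tuning_moments(m):
    """Angular averages of cos^(2m) and cos^(4m) of half the angle."""
    A1 = math.comb(2 * m, m) / 2 ** (2 * m)
    A2 = math.comb(4 * m, 2 * m) / 2 ** (4 * m)
    return A1, A2

def lambdas(eta0_sq, alpha, m, K, eps_values):
    """Lambda^1 and Lambda^2 for equally likely values of the quenched eps."""
    A1, A2 = tuning_moments(m)
    e1 = sum(eps_values) / len(eps_values)                  # <eps>
    e2 = sum(e * e for e in eps_values) / len(eps_values)   # <eps^2>
    common = alpha ** 2 + 2.0 * alpha * (A1 - alpha) * e1
    lam1 = eta0_sq * ((A2 + alpha ** 2 - 2.0 * alpha * A1) * e2 + common)
    lam2 = eta0_sq * ((A1 - alpha) ** 2 * ((K - 1) / K * e1 ** 2 + e2 / K) + common)
    return lam1, lam2

def input_information_bits(N, sigma_sq, lam1, lam2):
    """Leading (linear in N) term of the input information, in bits."""
    return N * (lam1 - lam2) / (2.0 * sigma_sq * math.log(2.0))

# Parameters quoted for Fig. 1: (eta^0)^2 = 0.1, alpha = 0.2, m = 1, K = 4.
lam1, lam2 = lambdas(eta0_sq=0.1, alpha=0.2, m=1, K=4, eps_values=[0.0, 0.5, 1.0])
info_100 = input_information_bits(N=100, sigma_sq=1.0, lam1=lam1, lam2=lam2)
```

Under these assumptions the Fig.1 parameters give roughly 0.46 bits for N = 100, and setting m = 0 makes $A_1 = A_2 = 1$, which corresponds to purely discrete correlates.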

As is clear from the comparison in Fig.1, when the two populations of units are affected
by the same noise the input information is considerably higher than the output one. This is
expected, since output and input noise sum up while influencing the firing of output neurons,
but also because the input distribution is taken to be a pure Gaussian, while the output rates
are affected by a threshold. If the input-output transformation were linear and the output
noise much smaller than the input one, one would expect that output and input units would
carry the same amount of information.
FIG. 1: Information rise, from eq.(35), as a function of the number M of output neurons. N = 100;
K = 4; $(\eta^0)^2 = 0.1$; $\alpha = 0.2$; $\xi^0 = -0.4$; $\sigma^2 = 1$; m = 1; $C\sigma_J^2 = 1$; $\sigma_\delta^2 = 1$. The distribution $\varrho(\varepsilon)$
in eq.(4) is just equal to 1/3 for each of the 3 allowed $\varepsilon$ values of 0, 1/2 and 1. The upper curve
is the linear term in the input information, calculated as a function of N as in [13] with identical
parameters.



Briefly, in a linear network with zero output noise one has:

$$p(\{\xi_i\}|\{\eta_j\}) = \prod_i\delta\Big(\xi_i - \sum_j c_{ij}J_{ij}\eta_j\Big). \tag{40}$$

Considering eqs.(11), (1), an effective expression for the distribution $p(\{\xi_i\}|\vartheta,s)$ can be
obtained by direct integration of the $\delta$ functions $\delta(\xi_i - \sum_j c_{ij}J_{ij}\eta_j)$, via their integral
representation, on $\{\eta_j\}$:

$$p(\{\xi_i\}|\vartheta,s) = \frac{1}{\sqrt{(2\pi)^M\det\Sigma}}\exp\Big[-\frac{1}{2}\sum_{i,j}\big(\xi_i - \tilde\xi_i(\vartheta,s)\big)\,(\Sigma^{-1})_{ij}\,\big(\xi_j - \tilde\xi_j(\vartheta,s)\big)\Big]; \tag{41}$$

$$\tilde\xi_i(\vartheta,s) = \sum_j c_{ij}J_{ij}\,\tilde\eta_j(\vartheta,s); \tag{42}$$

$$\Sigma_{ij} = \sigma^2\sum_k c_{ik}J_{ik}\,c_{jk}J_{jk}. \tag{43}$$

This distribution is then used to evaluate both the equivocation, eq.(13), and the entropy
of the responses, eq.(14). We do not report the calculation, which is straightforward and
analogous to the one reported in [13]. The final result, which is valid for a finite population
size M, and up to the linear approximation in $M(\eta^0)^2/4\sigma^2$, is analogous to eq.(39):

$$I(\{\xi_i\},\vartheta\otimes s) = \frac{1}{\ln 2}\frac{M}{2\sigma^2}\left[\Lambda^1 - \Lambda^2\right]. \tag{44}$$

Thus, we expect that taking the limits $\xi^0 \to \infty$ and $r \to 0$ simultaneously in eq.(35), we
should get to the same result: the output information should equal the input one when $\sigma^2$
grows large.

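From eq.(35) it is easy to show that:

$$\lim_{\xi^0\to\infty}\lim_{r\to 0} I(\{\xi_i\},\vartheta\otimes s) = \frac{1}{\ln 2}\frac{M}{2}\ln\left[1 + \frac{2C\sigma_J^2\,(\Lambda^1 - \Lambda^2)}{\sigma_\delta^2 + 2C\sigma_J^2\sigma^2}\right]. \tag{45}$$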



When $\sigma^2 \gg \sigma_\delta^2, \Lambda^1, \Lambda^2$ one obtains exactly the linear limit, eq.(44). We have verified this
analytical limit by studying numerically the approach to the asymptotic value of the mutual
information. Fig.2 shows the dependence of output information on the output noise $\sigma_\delta^2$, for
4 different choices of the (reciprocal of the) threshold, $\xi^0$. A large value, $\xi^0 = 10$, implies
linear output units. As expected, the output information, which always grows for decreasing
values of the output noise, for $\xi^0 = 10$ approaches asymptotically the input information. For
increasing values of the output noise, the information vanishes with a typical sigmoid curve,
with its point of inflection when the output matches the input noise.

FIG. 2: Output information, from eq.(35), as a function of the output noise $\sigma_\delta^2$, for 4 different
values of the output (reciprocal) threshold $\xi^0$. Logarithmic scale. N = 100; K = 4; M = 10;
$(\eta^0)^2 = 0.1$; $\alpha = 0.2$; $\sigma^2 = 1$; m = 1; $C\sigma_J^2 = 1$. The distribution $\varrho(\varepsilon)$ in eq.(4) is just equal to 1/3
for each of the 3 allowed $\varepsilon$ values of 0, 1/2 and 1. The dotted line represents the asymptotic value
of the input information, eq.(39), for N = 10.
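The approach to the linear limit can be illustrated numerically. The sketch below assumes that the limiting output information has the form $\frac{M}{2\ln 2}\ln\big[1 + 2C\sigma_J^2(\Lambda^1-\Lambda^2)/(\sigma_\delta^2 + 2C\sigma_J^2\sigma^2)\big]$, as eq.(45) is read here, and that the linear network carries $\frac{M}{2\sigma^2\ln 2}(\Lambda^1-\Lambda^2)$ bits; the function names are hypothetical and the value of $\Lambda^1-\Lambda^2$ is the one implied by the Fig.2 parameters under the same reading of eqs.(31), (32).

```python
import math

def output_info_linear_limit(M, sigma_sq, sigma_delta_sq, C_sigmaJ_sq, lam_diff):
    """Assumed form of eq. (45): output information (bits) for linear units
    in the limit of small r, as a function of the output noise variance."""
    x = 2.0 * C_sigmaJ_sq * lam_diff / (sigma_delta_sq + 2.0 * C_sigmaJ_sq * sigma_sq)
    return M / (2.0 * math.log(2.0)) * math.log1p(x)

def linear_network_info(M, sigma_sq, lam_diff):
    """Assumed form of eq. (44): noiseless linear network, linear order in M."""
    return M * lam_diff / (2.0 * sigma_sq * math.log(2.0))

lam_diff = 0.0063333333   # Lambda^1 - Lambda^2 for the Fig. 2 parameters (assumption)
noise_levels = [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]
curve = [output_info_linear_limit(10, 1.0, sd2, 1.0, lam_diff) for sd2 in noise_levels]
ceiling = linear_network_info(10, 1.0, lam_diff)
```

In this form the information decreases monotonically with $\sigma_\delta^2$ and, on a logarithmic noise axis, changes most rapidly where $\sigma_\delta^2$ is comparable to $2C\sigma_J^2\sigma^2$, i.e. where output and input noise are of the same order.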

We have then examined how the information in output (compared to the input) depends
on the number K of discrete correlates and on the width of the tuning function (3),
parametrized by m, with respect to the continuous correlate. Fig.3 shows a comparison
between input and output information for a sample of 10 cells, as a function of K. Both
curves quickly reach an asymptotic value, obtained by setting $K \to \infty$ in eq.(32) for $\Lambda^2$.
The relative information loss in output is roughly constant with K. A comparison is shown
with the case where correlates are purely discrete, which is obtained by setting m = 0 in
eq.(3). The curves exhibit a similar behaviour, even if the rise with K is steeper, and the
asymptotic values are higher. This may be surprising, but it is in fact a consequence of the
specific model we have considered, eq.(2), where a unit has the same tuning curve to each
of the discrete correlates, only varying its amplitude with respect to a value constant in the
angle. As $K \to \infty$, most of the mutual information is about the discrete correlates, and
the tuning to the continuous dimension, present for m = 1, effectively adds noise to the
discrimination among discrete cases, noise which is not present for m = 0.

FIG. 3: Comparison between input and output information as a function of the number K of
discrete correlates, for the case of continuous+discrete correlates (m = 1) or with purely discrete
correlates (obtained by setting m = 0). In eq.(35) we have set N = 100; r = 0.1; $\xi^0 = -0.4$;
$(\eta^0)^2 = 0.1$; $\alpha = 0.2$; $\sigma^2 = 1$; $C\sigma_J^2 = 1$; $\sigma_\delta^2 = 1$. The distribution $\varrho(\varepsilon)$ in eq.(4) is just equal to 1/3
for each of the 3 allowed $\varepsilon$ values of 0, 1/2 and 1.

With respect to the continuous dimension, the selectivity of the input units can be increased
by varying the power m of the cosine from 0 (no selectivity) through 1 (very distributed
encoding, as for the discrete correlates) to higher values (progressively narrower
tuning functions). Fig.4 reports the resulting behaviour of the information in input and in
output, for the case K = 1 (only a continuous correlate) and K = 4 (continuous+discrete
correlates). Increasing selectivity implies a "sparser" [21] representation of the angle, the
continuous variable, and hence less information, on average. However if the correlate is
purely continuous there is an initial increase, before reaching the optimal sparseness. It
should be kept in mind, again, that the asymptotic equality of the K = 1 and K = 4 cases
is a consequence of the specific model, eq.(2), which assigns the same preferred angle to each
discrete correlate. The resolution with which the continuous dimension can be discriminated
does not, within this model, improve with larger K, while the added contribution, of being
able to discriminate among discrete correlates, decreases in relative importance as the tuning
becomes sharper.

FIG. 4: Comparison between input and output information as a function of the selectivity along the
continuous dimension, which is made sharper by increasing m. K = 1 implies a purely continuous
correlate, while the continuous+discrete case is obtained by setting K = 4. In eq.(35) we have set
N = 100; r = 0.1; $\xi^0 = -0.4$; $(\eta^0)^2 = 0.1$; $\alpha = 0.2$; m = 1; $C\sigma_J^2 = 1$; $\sigma_\delta^2 = \sigma^2 = 1$, in both cases.
The distribution $\varrho(\varepsilon)$ in eq.(4) is just equal to 1/3 for each of the 3 allowed $\varepsilon$ values of 0, 1/2 and
1.

Figures 3 and 4 show that, as long as the output noise is non zero and the threshold
is finite, information is lost going from input to output, but the information loss does not
appear to depend on the structure and on the dimensionality of the correlate.

Note that, while the purely continuous case has been easily obtained by setting K = 1 in
the expression of $\Lambda^2$, eq.(32), for the purely discrete case it is enough to set m = 0.
VI. SIMULATION RESULTS: INFORMATION ESTIMATES VIA DECODING



One way to test the range of validity of our analytical approximation is via numerical
estimates of the information. In our specific case, where we deal with populations of continuous
neurons and with 4 different sources of quenched disorder, direct numerical integration
is prohibitive. A more feasible method is to generate simulated data both in input and in
output and then to estimate the information from the data, resorting to some algorithm.

Many previous studies (reviewed in [12]) have shown that information estimates from data
are extremely sensitive to sampling, and the distortion is the more serious the larger the
space of possible configurations which has to be sampled. In our case, where neurons
are characterized by their time averaged continuous firing rates, a simple binning of the
responses into discrete distributions results in a considerable distortion, which becomes even
more serious since a sparse sampling is required to perform the average across the four
quenched distributions.

Several procedures have been proposed to correct the bias affecting information estimates;
some of them (see [12]) are based on regularizations like binning or smoothing; other ones
rely on more theoretical approaches aiming at providing an analytical expression for the
correction [25].



When the responses vary in a continuum and the population size is very large it has
been suggested that the best estimate is obtained via decoding: the method consists in
generating a predicted stimulus $s'(\mathbf{r}_k)$ from each simulated response vector $\mathbf{r}_k(s)$ in each trial
$k$ by matching $\mathbf{r}_k(s)$ to the average response vector $\mathbf{r}^{av}(s')$, for all stimuli. The predicted
stimulus will be the one corresponding to the best match. Summing on all the trials one can
derive a probability table $p(s,s')$; then, the mutual information between the true and the
pseudo-stimuli is computed instead of the original one, between stimuli and responses:

$$I(s,s') = \sum_{s,s'}p(s,s')\log_2\frac{p(s,s')}{p(s)\,p(s')}. \tag{46}$$

From a theoretical point of view the optimal transformation to derive the probability
$p(s'|\mathbf{r})$ is defined by Bayes rule (Bayesian decoding):

$$p(\mathbf{r}|s)\,p(s) = p(\mathbf{r})\,p(s|\mathbf{r}); \tag{47}$$

In our specific case the input distribution is defined in eq.(1), so that relationship (47)
can be explicitly inverted; on the contrary the output distribution is not explicitly known
and one has to fit some function to the responses to be able to invert the relationship (47).
Further details about these procedures can be found in p, |2T) . 
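The decoding procedure can be summarized in a short sketch. It is an illustration only, under simplifying assumptions: responses are Gaussian around known mean vectors (in the simulations the average response vectors are estimated from the data), the best match is taken to be the nearest mean in Euclidean distance, and all function names are hypothetical.

```python
import math
import random

def mutual_information_bits(joint):
    """Mutual information (bits) of a joint probability table p(s, s')."""
    p_true = [sum(row) for row in joint]
    p_decoded = [sum(col) for col in zip(*joint)]
    info = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:
                info += p * math.log2(p / (p_true[i] * p_decoded[j]))
    return info

def decode_nearest_mean(response, means):
    """Predicted stimulus: index of the mean vector closest to the response."""
    return min(range(len(means)),
               key=lambda s: sum((a - b) ** 2 for a, b in zip(response, means[s])))

def decoded_information(means, sigma, trials, rng):
    """Simulate trials, decode each response, and return I(s, s') in bits."""
    K = len(means)
    counts = [[0] * K for _ in range(K)]
    for s in range(K):
        for _ in range(trials):
            response = [rng.gauss(mu, sigma) for mu in means[s]]
            counts[s][decode_nearest_mean(response, means)] += 1
    total = K * trials
    joint = [[c / total for c in row] for row in counts]
    return mutual_information_bits(joint)

rng = random.Random(1)
# Hypothetical mean responses of 2 cells to 4 equally likely discrete stimuli.
means = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]]
info_low_noise = decoded_information(means, sigma=0.1, trials=200, rng=rng)
info_high_noise = decoded_information(means, sigma=5.0, trials=200, rng=rng)
```

With 4 equally likely stimuli the decoded information is bounded by 2 bits, a value reached only when decoding is error-free; larger noise spreads the table $p(s,s')$ off the diagonal and lowers the estimate.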

Fig.5 shows the results of simulations for a sample of 20 output cells receiving from
1000 input cells. A MATLAB code has been used to generate data in an amount of 2000
responses per neuron per stimulus, where we have considered the simpler case of 4 purely
discrete stimuli (m = 0 in eq.(3)). The decoding in output has been optimized choosing a
high value for the threshold $\xi^0$, so that the single cell output distribution, as derived from
the simulated data, could be roughly fitted by a Gaussian. For each population size we have
chosen many random subsamples of units out of the whole population, both in input and
in output, and we have then averaged the information value across subsamples. Quenched
averages have been performed resorting to a sparse sampling.
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The plot shows that the curves obtained via simulations (dashed lines) match the ana- 
lytical prediction (solid lines) for a very small number of cells, but they deviate when the 
number of cells increases; since both the input and the output noise are large and the infor- 
mation values are much lower than the upper bound of 2 bits, this mismatch cannot be due 
to a deviation from linearity close to the ceiling regime. It is more probably an effect due 
to the distortion caused by the decoding, which is known to increase with the population 
size. The discrepancy between the true and the decoded information has also been recently 
investigated and quantified analytically f2~?fl . 
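
The 2-bit ceiling quoted above follows from the entropy of the stimulus set. A short worked 
bound, assuming the K = 4 discrete stimuli are equiprobable (an assumption consistent with 
the quoted value, though not stated explicitly in this section): 

```latex
I(s;\mathbf{r}) \;\le\; H(s) \;=\; -\sum_{s=1}^{K} p(s)\,\log_2 p(s)
\;=\; \log_2 K \;=\; \log_2 4 \;=\; 2 \ \text{bits}.
```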

The analytical approximation seems to have a wider applicability in output, where even 
for a population size larger than 4-5 cells the analytical curve falls within the error of the 
simulations. This is mainly due to the larger error characterizing the output information, 
which in turn is due to the stronger fluctuations in the information values across the 
different subsamples of cells. We have checked that the analytical results are always 
confirmed for a population size of 1 or 2 cells when varying the number of discrete correlates 
and the value of the input and output noise. 

We certainly cannot conclude from this that our analytical approximation is not valid 
for population sizes larger than 2-3 cells. On the contrary we are currently devising new 
algorithms to reduce the bias in the information estimates and preliminary results seem to 
suggest a much wider range of validity of the analytical approximation. A more detailed 
study comparing decoding to other algorithms will be published elsewhere [28]. 

FIG. 5: Comparison between the analytical approximation (solid lines) and simulations (dashed 
lines) for the input and the output information. N = 1000; K = 4; r = 0.02; ξ^0 = 2; (r/ ) 2 = 0.1; 
a = 0.1; m = 0; Coj = 1; a 2 = a 2 = 1. The distribution g(e) in eq.(||) is just equal to 1/2 for each 
of the 2 allowed e values of 0 and 1. 

VII. DISCUSSION 



We have attempted to clarify how information about multi-dimensional stimuli, with 
both a continuous and a discrete dimension, is transmitted from a population of units with 
a known coding scheme, down to the next stage of processing. 

Previous studies had focused on the mutual information between input and output units 
in a two-layer threshold-linear network either with learning [JT4] or with simple random 
connection weights [22]. 

More recent investigations have tried to quantify the efficiency of a population of units in 
coding a set of discrete [15] or continuous |Tj| correlates. The analysis in [15] has then been 
generalized to the more realistic case of multi-dimensional continuous+discrete correlates 
0- 

This work connects both research streams, in an effort to define a single conceptual 
framework for population coding. The main difference with the second group of studies is 
obviously the presence of the network linking input to output units. The main difference 
with the first two papers, instead, is the analysis of a distinct mutual information quantity: 
not between input and output units, but between correlates ("stimuli") and output units. In 
[15] it had been argued, for a number K of purely discrete correlates, that the information 
about the stimuli reduces to the information about the "reference" neural activity when 
K → ∞. The reference activity is simply the mean response to a given stimulus, when the 
information is measured from the variable, noisy responses around that mean; or it can be 
taken to be the stored pattern of activity, when the retrieval of such patterns is considered, 
as in Hjj. True, the information about the stimuli saturates at the entropy of the stimulus 
set, but for K → ∞ this entropy diverges, only the linear term in N is relevant [15], and the 
two quantities, information about the stimuli and information about the reference activity, 
coincide. 

Our present saddle point calculation is only able to capture, effectively, the mutual 
information which is linear in the number of input units, as mentioned above. It fails to 
describe the approach to the saturating value, the entropy of the set of correlates, be this 
finite or infinite. Therefore, ours is close to a calculation of the information about a 
reference activity, in our case the activity of the input units. The remaining difference is 
that we can take into account, albeit solely in the linear term, the dependence on K 
(through eq. (32)), without having to take the further limit K → ∞. 

Due to the presence of a threshold and of a non-zero output noise, the information in 
output is lower than that in input, and we have shown analytically that in the limit of a 
noiseless, linear input-output transfer function the output information tends asymptotically 
to the input one. We have not, however, introduced a threshold in the input units, which 
would be necessary for a fair comparison. In an independent line of research, recent work 
H has also quantified the contribution to the mutual information, in a different model, of 
cubic and higher order non-linearities in the transfer function, by means of a diagrammatic 
expansion in a noise parameter. In it has been shown that the effect of a threshold in 
the input units on the input information results merely in a renormalization of the noise. 
The resulting effect on the output information remains to be explored, possibly with similar 
methods. 

Having considered mixed continuous and discrete dimensions in our stimulus set, we asked 
whether the information loss in output depends on the presence or absence of discrete or 
continuous dimensions in the stimulus structure. We have shown that for a fixed, finite 
level of noise this loss does not depend significantly on the structure of the stimulus, 
but solely on the relative magnitude of input and output noise, and on the position of the 
output threshold. 

Our analytical efforts have also been motivated by the difficulty of performing a simulation 
study for this specific scheme of continuous rate coding in the presence of several distinct 
sources of quenched disorder. Nonetheless, we have performed some simulations using a 
decoding procedure to estimate the information from simulated data both in input and in 
output. Our results confirm previous findings in that the distortion due to decoding grows 
with the population size, so that the simulations confirm the analytical prediction only for 
a very small number of cells. We are currently devising new computational methods to 
improve the match between the analytical and the simulation results. 

Further developments of this analysis include the evaluation of the output information in 
the presence of learning, in line with [H|, and with correlations in the firing of input units. 

A recent work has shown that the interplay between short and long range connectivities 
in the Hopfield model leads to a deformation of the phase diagram with the appearance of 
novel phases p9 |. It would be interesting to introduce short and long range connections 
in our model, and to examine how the coding efficiency of output neurons depends on the 
interaction between short and long range connections. This will be the object of future 
investigations. 